The Effect of Missing Data on Classification Quality
Abstract
The field of data quality management has long recognized the negative impact of data quality defects on decision quality. In many decision scenarios, this negative impact can be largely attributed to the mediating role played by decision-support models: with defective data, the estimation of such a model becomes less reliable and, as a result, the likelihood of flawed decisions increases. Drawing on that argument, this study presents a methodology for assessing the impact of quality defects on the likelihood of flawed decisions. The methodology is first presented at a high level, and then extended to analyze the impact of missing values on binary Linear Discriminant Analysis (LDA) classifiers. To conclude, we discuss possible extensions and directions for future research.
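The link between missing values and a less reliable binary LDA model can be illustrated with a small simulation. The sketch below is not the methodology of the study; it assumes two Gaussian classes with two features, values missing completely at random, and listwise deletion of incomplete training rows. All function names and parameter values are hypothetical choices made for illustration.

```python
import random

def sample(n, rng):
    """Draw n labelled points: class 0 centred at (0, 0), class 1 at (1.5, 1.5)."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randrange(2)
        mu = 1.5 * y
        rows.append((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)))
        labels.append(y)
    return rows, labels

def drop_incomplete(rows, labels, p, rng):
    """Listwise deletion: a row survives only if none of its cells is lost.
    Each cell is lost independently with probability p (MCAR assumption)."""
    kept_rows, kept_labels = [], []
    for x, y in zip(rows, labels):
        if all(rng.random() >= p for _ in x):
            kept_rows.append(x)
            kept_labels.append(y)
    return kept_rows, kept_labels

def fit_lda(rows, labels):
    """Fit a two-class, two-feature LDA with equal priors.
    Returns (w, c) for the rule: predict class 1 when w.x > c."""
    groups = {0: [], 1: []}
    for x, y in zip(rows, labels):
        groups[y].append(x)
    means = {k: [sum(col) / len(v) for col in zip(*v)] for k, v in groups.items()}
    # Pooled within-class covariance matrix [[s00, s01], [s01, s11]].
    s00 = s01 = s11 = 0.0
    for k, v in groups.items():
        m0, m1 = means[k]
        for a, b in v:
            s00 += (a - m0) ** 2
            s01 += (a - m0) * (b - m1)
            s11 += (b - m1) ** 2
    dof = len(rows) - 2
    s00, s01, s11 = s00 / dof, s01 / dof, s11 / dof
    det = s00 * s11 - s01 ** 2
    d0 = means[1][0] - means[0][0]
    d1 = means[1][1] - means[0][1]
    # w = S^{-1} (mu1 - mu0), using the closed-form 2x2 inverse.
    w = [(s11 * d0 - s01 * d1) / det, (-s01 * d0 + s00 * d1) / det]
    # Threshold at the projected midpoint of the two class means.
    c = sum(w[i] * (means[0][i] + means[1][i]) / 2 for i in range(2))
    return w, c

def accuracy(w, c, rows, labels):
    """Share of rows whose predicted class matches the true label."""
    hits = sum((w[0] * a + w[1] * b > c) == (y == 1)
               for (a, b), y in zip(rows, labels))
    return hits / len(rows)

rng = random.Random(0)
train_x, train_y = sample(200, rng)
test_x, test_y = sample(2000, rng)

results = []
for p in (0.0, 0.2, 0.4):
    kept_x, kept_y = drop_incomplete(train_x, train_y, p, rng)
    w, c = fit_lda(kept_x, kept_y)
    results.append((p, len(kept_x), accuracy(w, c, test_x, test_y)))
    print(f"missing rate {p:.1f}: {len(kept_x)} training rows, "
          f"test accuracy {results[-1][2]:.3f}")
```

Repeating the run over many seeds and comparing the spread of the fitted weights at each missing rate would give a rough picture of how estimation reliability degrades as fewer complete rows remain. Other missingness mechanisms or imputation strategies would need a different `drop_incomplete` step.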
Similar resources
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also a continuing interest among banks in using rule-based classifiers to b...
A MODEL FOR MIXED CONTINUOUS AND DISCRETE RESPONSES WITH POSSIBILITY OF MISSING RESPONSES
A model for missing data in mixed binary and continuous responses, which can be used on cross-sectional data, is presented. In this model, the response indicator for the binary response can depend on the continuous response. A closed form for the likelihood is found. For data with a complicated pattern of missing responses, some new residuals are also proposed. The model of multiplicative heter...
Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons
Background: Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. The presence of missing data challenges the practice of model development. Several studies have suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One of the issues which was of less concern...
Several approaches to handling missing values of quantitative variables and their effect on the results of a clinical trial
Background and Objectives: A major challenge affecting longitudinal studies is the problem of missing data. Missing data may result in the loss of part of the information, which reduces the accuracy of the estimator and can make the results biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism in longitudinal research and to consider thi...
A Comprehensive Method of Evaluating Open Government Data with the Aim of Improving Data Quality and Increasing Citizens' Willingness
Purpose: The purpose is to present an open government data evaluation method that considers comprehensive dimensions and criteria: calculating the weight and importance of each criterion, examining the country in this area, clustering organizations, and presenting a classification model to predict the situation. Methodology: Library studies were used to extract the dimensions and cr...